docs(design): proposed txn secondary-commit idempotency (Jepsen 26198185540)#796
Conversation
…6198185540)
Triggered by the scheduled Jepsen run 26198185540 surfacing a
real :duplicate-elements anomaly on the Redis list-append
workload. The duplicate is a single client-level :append being
applied twice in storage with DIFFERENT commit_ts: the EXEC
body's retry loop (runTransaction's retryRedisWrite) regenerates
startTS / commitTS / list index on each iteration, so content-
addressing — which catches identical-input replay — does not
catch the regenerated-input retry-after-partial-success shape.
Design proposes a server-side per-txn applied marker keyed by an
adapter-allocated UUIDv7, written atomically with the commit
mutations and checked at FSM apply entry. TiKV's
CheckTxnStatus + content-addressed Write CF design is the
reference; this proposal lifts the same correctness guarantee
into the FSM layer because elastickv's RPUSH/LPUSH-derived
storage keys are NOT pure content-addressed (the seq number
depends on a meta read at attempt time).
Doc covers:
- Why the same-commit_ts replay does NOT duplicate
(content-addressed delta keys)
- Why the different-commit_ts replay DOES duplicate
(runTransaction's EXEC-body retry regenerates everything)
- Why applyCommitWithIdempotencyFallback does NOT catch this
(works only on identical-input replay)
- How TiKV's three layers (content-addressed writes +
CheckTxnStatus + LockResolver) compose, and which one
elastickv is missing
- Server-side marker design (key prefix, GC TTL, batch atomicity)
- Optional client-side commit-status check
- Six implementation milestones (M1 proto/UUID, M2 FSM marker,
M3 adapter sentinel handling, M4 GC, M5 optional check,
M6 Jepsen validation)
- Caller audit table covering all commit-request entry points
- Four risks (GC timing, proto overhead, reconstruction
intricacy, LockResolver interaction)
The doc is independent of PR #795 (full Jepsen log artifact);
the fix design works for any partial-commit code path that
might be the actual trigger.
|
Warning Review limit reached
More reviews will be available in 21 minutes. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (10)
📝 WalkthroughWalkthroughThis PR adds opt-in one-phase transaction idempotency dedup: store-level exact-commit probes, TxnMeta PrevCommitTS, coordinator threading, FSM probe-based no-op, and adapter list-push write-set reuse plus tests. ChangesOne-Phase Idempotency Dedup with PrevCommitTS
Sequence DiagramsequenceDiagram
participant Client
participant Adapter as RedisAdapter
participant Coordinator
participant FSM
participant Store
Client->>Adapter: RPUSH attempt 1
Adapter->>Coordinator: Dispatch(startTS)
Coordinator->>FSM: One-phase request (commitTS=T1, prevCommitTS=0)
FSM->>Store: Apply mutations -> commit at T1
FSM-->>Coordinator: Committed
Adapter-->>Client: Retry signalled
Adapter->>Adapter: Capture write set from attempt 1
Adapter->>Coordinator: Dispatch(retry, prevCommitTS=T1)
Coordinator->>FSM: One-phase request (commitTS=T2, prevCommitTS=T1)
FSM->>Store: CommittedVersionAt(primaryKey, T1)?
alt Found
FSM-->>Coordinator: no-op (nil)
Adapter->>Adapter: Return stored client-visible length
Adapter-->>Client: Result (no duplicate)
else Not found
FSM->>Store: Apply mutations at T2
FSM-->>Coordinator: Committed
Adapter-->>Client: Result (applied)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Code Review
This design document proposes a server-side transaction-applied marker and an optional client-side check to ensure exactly-once application of multi-shard transaction commits, addressing a duplicate-element anomaly found during Jepsen testing. The proposed solution involves using a UUIDv7 as a transaction identifier to detect and handle retries that might otherwise result in duplicate writes. Review feedback highlights the need for the new error sentinel to include the committed timestamp to aid result reconstruction, suggests optimizing the marker garbage collection by leveraging the time-ordered nature of UUIDv7 with DeleteRange, and warns about the operational risks of adding synchronous reads to the secondary commit retry path.
| // Primary already committed. If our secondary's marker is | ||
| // present, this is a true no-op — return success. Otherwise | ||
| // retry the secondary commit (it really did fail). | ||
| if secondary_marker_present(adapter_txn_id) { return Ok } |
There was a problem hiding this comment.
The secondary_marker_present check in milestone M5 introduces a significant operational risk: a synchronous read from the secondary shard before every retry. If the secondary shard is experiencing issues (e.g., leader election or network partition), this check will likely fail or time out, delaying the retry loop and impacting availability. As per repository guidelines for operational risks, the design should detail mitigation strategies; a 'blind' retry might be more robust as it avoids the added latency and dependency in the failure path.
References
- When a design document identifies a significant operational risk, it must also detail potential mitigation strategies to ensure system stability.
| 2. If present and the recorded commit_ts equals the request's commit_ts: | ||
| no-op (return nil). The mutations would be a duplicate. | ||
| 3. If present and the recorded commit_ts **differs**: return | ||
| `ErrTxnAlreadyCommittedAtDifferentTS` (a new sentinel). This catches the |
There was a problem hiding this comment.
To facilitate the result reconstruction described in milestone M3, the ErrTxnAlreadyCommittedAtDifferentTS sentinel should carry the recorded commit_ts in its response payload. Since the adapter regenerates startTS on every retry (as noted in the walkthrough), it will not know the original startTS used by the successful attempt and thus cannot look up the primary's commit record via txnCommitKey(primary_key, original_start_ts). Returning the commit_ts directly from the FSM marker eliminates this lookup dependency and supports smoother rolling upgrades by reducing the need for consistent state lookups across different node versions.
References
- Design documents should address operational risks such as rolling upgrades by detailing strategies that ensure compatibility across different node versions.
|
|
||
| A conservative TTL of `2 × (lockResolverInterval + defaultTxnLockTTLms)` | ||
| = 80 s covers both. Cleanup runs in `kv/compactor.go`'s existing periodic | ||
| sweep — add a new prefix scan that drops markers whose `applied_at` is |
There was a problem hiding this comment.
Since adapter_txn_id is a UUIDv7, it is naturally time-ordered (the most significant 48 bits contain a millisecond timestamp). Instead of a prefix scan that reads every value to check applied_at_wall_time, the compactor can use a more efficient DeleteRange operation. By calculating the UUIDv7 prefix corresponding to the TTL cutoff, you can delete all expired markers in a single atomic operation without the O(N) read overhead during the periodic sweep.
|
@claude review |
|
Claude finished @bootjp's task in 3m 40s —— View job Design Review: Transaction Secondary-Commit Idempotency
SummaryThe root cause analysis is accurate and the proposed server-side marker is the right enforcement layer. The design is sound for the primary-commit case. The issues below range from a hard correctness gap in M3 to a missing backward-compatibility section, with several areas worth clarifying before implementation starts. Critical1. M3's adapter recovery reads The FSM already has the committed type ErrTxnAlreadyCommittedAtDifferentTS struct {
RecordedCommitTS uint64
}The adapter can then go directly to the actual state via 2. Rolling-upgrade compatibility: zero/absent During a rolling upgrade, nodes on old code will send This must be explicit in M1/M2: if Important3. Multi-shard retry safety — the design is correct but the reasoning needs to be made explicit The caller audit doesn't explain WHY a Attempt 2 can't partially commit on secondary shards before the primary check fires. The safety actually flows from the coordinator structure: This is the key invariant that makes the multi-shard case safe. It's worth one sentence in the design explicitly linking the marker check to the primary-first commit ordering. 4. FSM hot-path read overhead missing from risks R2 only calls out the 16-byte proto field. Every commit now also requires a Pebble point read ( This is probably acceptable (Pebble point reads from block cache are sub-microsecond), but it should be in the risks section so reviewers of M2 can benchmark against it and decide if a bloom filter or skip-list layer helps. 5. GC — replace prefix scan with The current M4 description calls for a prefix scan that reads every marker value to check The UUIDv7 cutoff calculation is straightforward: shift the cutoff epoch-ms into bits [127:80] of a 128-bit value, zero the rest. Any UUIDv7 with a timestamp before the cutoff will compare less than this value under lexicographic byte order. 6. M5 availability risk — needs an explicit mitigation (addresses Gemini's high-priority finding) The synchronous secondary-marker read in M5 runs on the already-degraded retry path. If the secondary shard is mid-leader-election, this read adds unbounded latency before the retry loop can proceed. Given that M5 is labeled "optional" the risk is bounded, but the doc should note: when implementing M5, gate the marker read on a short deadline (e.g. Minor7. Caller audit — LockResolver row needs a clarifying split The current row conflates two LockResolver sub-cases:
Both paths are handled correctly by the design, but spelling them out prevents the "same (start_ts, commit_ts)" reasoning from looking circular to implementers who will touch 8. TTL lower bound — check against gRPC deadline The TTL formula Strengths worth noting
Overall: approve the design direction, suggest addressing issues 1 and 2 before merging (M3 is blocked on 1; rolling upgrades are blocked on 2), and fold 5 and 8 into M4's scope. |
…26340084100) The full demo-cluster log from scheduled run 26340084100 (first failing run after PR #795 landed the full-log artifact) positively identifies the duplicate trigger. Findings: - Run reproduced the same :duplicate-elements ({153 2} on key 17, 154/155 lost) plus a downstream :G-single-item-realtime cycle (the append-checker's report of the corrupted version order the duplicate causes — not an independent fault). - elastickv-demo.log shows 20+ leadership changes in the ~3-min workload with repeated 'found conflict at index N' log-tail truncations. - At 18:21:48 (exact instant append 17 153 was processed) a 3-way vote storm was in flight: two peers at index 888 campaigning while the incumbent leader sat at index 890 with a valid lease. The two-entry divergence is the ambiguous-commit window. Confirms the doc's core hypothesis (retryRedisWrite regenerating commitTS defeats content-addressing) and pins the trigger to leadership churn rather than commitSecondaryWithRetry or the LockResolver specifically. The server-side UUID marker (M1-M4) fixes it regardless of which raft event produced the ambiguous commit. Also recorded the CI CPU-starvation amplifier (single-process 3-node demo + JVM on a 2-core runner → heartbeat delay → election storms) as a separate non-blocking follow-up, and added a resolved open question confirming the marker must also cover single-key Phase_NONE writes (listPushCore's own retryRedisWrite regenerates commitTS too).
…ejected Per review discussion: the observed :duplicate-elements is always an internal retry within a single retryRedisWrite loop, which TiKV's CheckTxnStatus pattern fully covers — check whether the previous attempt committed before re-running, and if so return its result instead of re-appending. This is lighter than the original UUID + FSM-marker + GC design: adapter-only, no proto wire field, no FSM dedup table, no periodic GC. The previous-attempt identity (startTS/commitTS/primaryKey) is held in a local across the retry loop; 'did it commit?' is answered by probing the primary txnCommitKey (multi-shard) or the prior attempt's own data version at commitTS_prev (single-shard one-phase, which writes no commit record). Rewrote: - Title/intro + 'How TiKV solves this' closing — name CheckTxnStatus as the missing layer (2) this design adds; note that RPUSH's data-dependent index is what defeats content-addressing (layer 1). - 'Proposed design' — CheckTxnStatus mechanism, why-UUID-rejected subsection, where the check lives (3 retry sites), caller audit showing the FSM apply path is UNCHANGED. - Milestones M1-M4 — identity plumbing, probe helpers, wire into 3 sites, validation. ~250 LOC, no proto/FSM/GC (was ~600 LOC + proto). - Risks R1-R4 — result reconstruction, probe read cost, false-negative under deep churn (mitigation = the rejected marker design, kept in git history), single-shard no-commit-record handling. - Open questions + decision log updated. The marker design is preserved in this file's git history should R3 (probe false-negative under deep churn) ever be observed.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: aa41e9fff6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| LockResolver specifically. The server-side marker (M1–M4) is the | ||
| correct fix because it deduplicates **regardless of which raft event | ||
| produced the ambiguous commit** — it keys on the adapter-allocated | ||
| txn UUID, which is stable across both the `retryRedisWrite` |
There was a problem hiding this comment.
Remove stale marker-based fix claim from resolved section
This paragraph says the "server-side marker" with an adapter txn UUID is the correct fix, but the same document’s stated design is a CheckTxnStatus probe and explicitly rejects the UUID/FSM marker approach. That contradiction can send implementation work down the wrong path (proto/FSM marker work the doc says to avoid), so this section should be updated to match the selected CheckTxnStatus design.
Useful? React with 👍 / 👎.
…y, option (a) fallback CheckTxnStatus-as-primary is retracted: the cluster is single-group, so every txn takes the one-phase path, which writes no lock and no commit record — CheckTxnStatus has nothing to read. Rewrite with option 2 (reuse the failed attempt's write set, dedup on an exact-commit_ts probe at one-phase apply) as primary and option (a) (give one-phase a lock + commit record so CheckTxnStatus applies) as the documented fallback. Add the central correctness argument head-on: new data is written only at a fresh monotonic commit_ts (SI preserved), the stale commit_ts is read-only in an exact-ts probe (dedup identity), and Raft log order makes the probe race-free (E1 applies before E2 or never). Because the argument holds, option 2 is primary.
…propose/apply path The race-freedom premise (attempt-1's entry applies before the retry's or never, never after) is confirmed: txn dispatch proposes with context.Background() and blocks until apply or until the proposal is failed-and-drained on a Leader->non-Leader transition (failPending); a drained proposal is never re-proposed, and fsm.Apply runs at the entry's own (lower) log index. Option 2 is cleared to implement.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8f2b800a9c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| - `applyRequests` (`kv/transaction.go:152`) proposes with | ||
| `context.Background()`, so `Engine.Propose` | ||
| (`internal/raftengine/etcd/engine.go:606`) blocks until the entry | ||
| applies or the proposal is *failed* — it never abandons on a caller | ||
| timeout (the `ctx.Done()` branch is dead under Background). |
There was a problem hiding this comment.
Re-validate ordering proof with cancellable txn context
This verified-linchpin section assumes applyRequests proposes with context.Background() and therefore cannot return on caller timeout, but the current single-shard txn path passes the caller context through (dispatchSingleShardTxn → Txn.Commit → applyRequests(ctx, ...)), and Engine.Propose explicitly has a ctx.Done() early-return path. That means the "dispatch returns only after apply or abandonment" premise is not established by this evidence, so the race-freedom argument for option 2 is currently built on an incorrect runtime assumption and should be re-proven (or constrained) for deadline/cancellation scenarios.
Useful? React with 👍 / 👎.
…ncy M2a) Option 2's one-phase idempotency dedup needs to ask 'did the previous attempt — which committed at exactly prev_commit_ts — land?'. Add CommittedVersionAt(ctx,key,commitTS) to MVCCStore: an exact-timestamp existence check (point lookup), distinct from GetAt's newest-version-<=ts scan. Because commit timestamps are unique and strictly monotonic (HLC Clock().Next()), a version at an exact commitTS on a key can only be the txn assigned that timestamp, so an exact hit unambiguously identifies that attempt — unlike the loose latestTS>=commitTS check in applyCommitWithIdempotencyFallback, which would misclassify a different txn's later commit as our own. - pebbleStore: point Get on encodeKey(key, commitTS); a tombstone counts as landed (the attempt committed a delete), so the value is not decoded. - mvccStore: binary search the TS-ascending version slice for an exact hit. - ShardStore / LeaderRoutedStore: delegate to the local group store; this is an apply-local probe, not a leader-routed client read (no fence/proxy). Tests run both store implementations through a shared suite: exact hit, absent-key miss, exactness (a version at 300 must NOT satisfy a probe at 200), tombstone-counts-as-landed, and cross-key isolation. Refs docs/design/2026_05_21_proposed_txn_secondary_idempotency.md (M2).
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c1f282336d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| func (s *pebbleStore) CommittedVersionAt(_ context.Context, key []byte, commitTS uint64) (bool, error) { | ||
| s.dbMu.RLock() | ||
| defer s.dbMu.RUnlock() | ||
|
|
||
| _, closer, err := s.db.Get(encodeKey(key, commitTS)) |
There was a problem hiding this comment.
Return compaction error for stale exact-ts probes
This exact-timestamp probe bypasses the retention watermark check that GetAt/ExistsAt enforce, so when commitTS is below minRetainedTS it returns (false, nil) instead of ErrReadTSCompacted. In the idempotency-retry flow this makes "history compacted" indistinguishable from "previous attempt never landed", which can incorrectly trigger re-application after long delays or aggressive compaction. The method should apply the same readTSCompacted guard as other timestamped reads.
Useful? React with 👍 / 👎.
| func (s *mvccStore) CommittedVersionAt(_ context.Context, key []byte, commitTS uint64) (bool, error) { | ||
| s.mtx.RLock() | ||
| defer s.mtx.RUnlock() | ||
|
|
||
| v, ok := s.tree.Get(key) |
There was a problem hiding this comment.
Enforce retention guard in in-memory exact-ts lookup
The in-memory implementation has the same behavior gap: it reads historical versions directly without checking minRetainedTS, so probes for compacted timestamps silently return false rather than ErrReadTSCompacted. That breaks parity with the Pebble backend and can hide compacted-history conditions from callers that rely on exact-ts probes for dedup decisions.
Useful? React with 👍 / 👎.
Carry prev_commit_ts (a failed prior attempt's commit timestamp) into the single-shard one-phase apply and no-op the apply when that attempt already landed — preventing the :duplicate-elements anomaly without a commit record, lock, or extra Raft round (option 2). Wire format: prev_commit_ts rides in the existing TxnMeta blob as a new PrevCommitTS field + V2 flag 0x04 — NO proto change. EncodeTxnMeta emits V2 only when PrevCommitTS != 0, so every non-retry caller keeps the V1 default wire format. handleOnePhaseTxnRequest probes CommittedVersionAt(primaryKey, prev_commit_ts) and returns nil (no-op the whole apply) on an exact hit. Determinism (R5): the probe changes the apply outcome, so it must be computed identically on every node. It is — the decision depends only on the replicated prev_commit_ts and the deterministic exact-ts probe at the entry's apply point. A pre-flag node fails loud (unknown-flag rejection) rather than diverging; emission will be gated off by default (M3) until the whole cluster is probe-aware (ship-the-reader-before-the-writer). Tests: codec round-trips PrevCommitTS (V2) and keeps V1 when unset; FSM dedup no-ops when the prior attempt landed at exactly T1 (no version at the fresh T2, newest stays T1), applies at T2 when it did not land, and skips the probe entirely when prev_commit_ts == 0. Refs docs/design/2026_05_21_proposed_txn_secondary_idempotency.md (M2).
… idempotency M1+M3) Thread prev_commit_ts from the adapter to the one-phase apply and make list-push retries reuse the failed attempt's write set instead of recomputing seq — the recompute is what duplicates the element under leadership churn. - kv.OperationGroup gains PrevCommitTS, threaded through ShardedCoordinator.Dispatch -> dispatchTxn -> dispatchSingleShardTxn -> onePhaseTxnRequestWithPrevCommit. - listPushCore branches on RedisServer.onePhaseTxnDedup (default OFF; WithOnePhaseTxnDedup / ELASTICKV_REDIS_ONEPHASE_DEDUP). When on, listPushCoreWithDedup tracks a reusableListPush and, on a retryable error, reuses the exact write set under a fresh commit_ts with PrevCommitTS set so the FSM no-ops if the prior attempt landed. A WriteConflict from a reuse means the probe ruled out our own attempt and the seq is genuinely taken, so it recomputes from a fresh meta. R1 (result reconstruction) resolved and far simpler than feared: because the retry reuses the write set built from attempt 1's meta, the post-push length is invariant across reuse, so the adapter just returns the remembered length — no store re-read. Gating keeps FSM determinism (R5): off by default, so no prev_commit_ts is emitted and the FSM is byte-identical to today until the whole cluster is probe-aware. Tests (real OCC + real exact-ts probe): landed-then-ambiguous -> dedup no-op -> no duplicate, correct length; not-landed -> reuse applies; genuine cross-txn conflict -> recompute to a new seq (two distinct elements); gate-off keeps the legacy path. Refs docs/design/2026_05_21_proposed_txn_secondary_idempotency.md (M1, M3).
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d3e97050b3
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| } | ||
|
|
||
| func onePhaseTxnRequest(startTS, commitTS uint64, primaryKey []byte, reqs []*Elem[OP], readKeys [][]byte) *pb.Request { | ||
| return onePhaseTxnRequestWithPrevCommit(startTS, commitTS, 0, primaryKey, reqs, readKeys) |
There was a problem hiding this comment.
Preserve PrevCommitTS when building one-phase txn logs
onePhaseTxnRequest now hardcodes prevCommitTS=0, so any retry carrying OperationGroup.PrevCommitTS loses the dedup probe key on the Coordinate path (including leader redirects that call buildRedirectRequests -> onePhaseTxnRequest). In that environment, an ambiguous first commit can be retried without PrevCommitTS, causing the FSM to re-apply instead of no-op and reintroducing duplicate list mutations that this change is meant to prevent.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (6)
store/committed_version_at_test.go (2)
19-77: ⚡ Quick winPrefer a table-driven suite for these subcases.
The cases are currently repeated
t.Runblocks; converting to a table will keep this easier to extend as new exact-ts cases are added for both backends.As per coding guidelines
**/*_test.go: "prefer table-driven test cases".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@store/committed_version_at_test.go` around lines 19 - 77, Replace the multiple t.Run subtests in TestCommittedVersionAt with a single table-driven loop: create a slice of test cases (fields: name, setup ops as functions or a list describing calls to newStore/PutAt/DeleteAt, probeKey, probeTS, wantBool) and iterate calling t.Run(tc.name, func(t *testing.T){ st := newStore(t); perform tc.setup using st and ctx (use PutAt/DeleteAt as needed); got, err := st.CommittedVersionAt(ctx, tc.probeKey, tc.probeTS); require.NoError(t, err); require.Equal(t, tc.wantBool, got) }). Keep references to CommittedVersionAt, PutAt, DeleteAt, newStore and ctx so each case reproduces the original behaviors (exact hits, absent key, newer version, tombstone, different key).
57-77: ⚡ Quick winAdd one TTL/HLC-focused case for exact-ts probe behavior.
Please add a case that writes with TTL/expire semantics and still asserts
CommittedVersionAtdepends only on exact commit timestamp existence (not current visibility), to pin clock-related behavior explicitly.As per coding guidelines
{store,kv}/**/*_test.go: "Add test coverage for TTL/HLC behaviors when touching clocks, OCC, or replication logic instore/andkv/packages."kv/fsm_onephase_dedup_test.go (1)
40-112: ⚡ Quick winConsolidate the three scenario tests into a table-driven test.
Lines 40-112 are structurally repetitive and can be expressed as one table-driven test with per-case setup/expectations, which will make future dedup scenarios easier to extend and keep consistent.
As per coding guidelines, "
**/*_test.go: Unit tests must be co-located with packages in*_test.gofiles; prefer table-driven test cases."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@kv/fsm_onephase_dedup_test.go` around lines 40 - 112, Replace the three repetitive tests TestOnePhaseDedup_NoOpsWhenPriorAttemptLanded, TestOnePhaseDedup_AppliesWhenPriorAttemptDidNotLand, and TestOnePhaseDedup_FirstAttemptSkipsProbe with a single table-driven test that iterates cases describing the scenario name, initial state (e.g. pre-landed write at commit_ts 20 when needed), the onePhaseReq inputs (start_ts, commit_ts, prev_commit_ts, key, value) and per-case assertions (whether st.CommittedVersionAt should be true at the fresh commit_ts, expected LatestCommitTS value, and expected value via st.GetAt). Use the existing helpers applyFSMRequest and onePhaseReq and instantiate a fresh store.NewMVCCStore() and fsm := NewKvFSMWithHLC(st, NewHLC()).(*kvFSM) inside each subtest; for the "landed" case pre-seed the store by calling applyFSMRequest with the initial attempt as in the old TestOnePhaseDedup_NoOpsWhenPriorAttemptLanded. Ensure each case runs t.Run(case.name, func(t *testing.T){ t.Parallel() ... }) and performs the same checks that the original three tests did.adapter/redis.go (1)
3284-3302: ⚡ Quick winSkip the reuse path for a definitive first-attempt write conflict.
Line 3296 caches
pendingeven when the initialDispatchreturnedstore.ErrWriteConflict. That attempt is already known not to have landed, so the next loop only burns an extra guaranteed-miss probe before eventually recomputing from fresh metadata.♻️ Suggested change
_, dispErr := r.coordinator.Dispatch(ctx, &kv.OperationGroup[kv.OP]{ IsTxn: true, StartTS: startTS, CommitTS: commitTS, Elems: ops, }) if dispErr == nil { newLen = updatedMeta.Len return nil } + if errors.Is(dispErr, store.ErrWriteConflict) { + return errors.WithStack(dispErr) + } // Retryable failure: remember this attempt so the next iteration reuses // its write set (and probes prev_commit_ts) rather than recomputing seq. pending = &reusableListPush{ ops: ops, startTS: startTS,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@adapter/redis.go` around lines 3284 - 3302, The code currently stores a reusableListPush into pending even when r.coordinator.Dispatch returns a definitive store.ErrWriteConflict, causing the next iteration to reuse a write set that is known not to have landed; modify the error-handling after r.coordinator.Dispatch so that if dispErr is the definitive write-conflict (use errors.Is(dispErr, store.ErrWriteConflict) or the appropriate sentinel comparison) you do NOT assign pending = &reusableListPush{...} and simply return the error, while preserving the existing pending caching behavior for other retryable errors; refer to r.coordinator.Dispatch, the pending variable and the reusableListPush struct when making this change.adapter/redis_list_dedup_test.go (2)
205-219: ⚡ Quick winThis case never exercises the disabled retry path.
ambiguousDispatch=0makeslistRPushsucceed on the first dispatch, so this only proves the default-off happy path. It will not catch accidentalPrevCommitTSthreading or write-set reuse when dedup is disabled. Inject one retryable failure here and assert the second attempt recomputes withprobeNoOps == 0.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@adapter/redis_list_dedup_test.go` around lines 205 - 219, The test currently uses newDedupTestCoordinator(st, 0, false) so it never triggers a retry; change the coordinator to inject one retryable/ambiguous failure by calling newDedupTestCoordinator(st, 1, false), then call srv.listRPush as before and update assertions to expect a retry (e.g., coord.dispatches == 2) and assert coord.probeNoOps == 0 to verify the disabled-dedup legacy path recomputes on the second attempt; keep the rest of the test setup (srv, key, values) unchanged and only adjust the coordinator arg and the assertions.
104-219: ⚡ Quick winFold these scenarios into a table-driven test.
The setup and verification shape is nearly identical across all four cases. A case table would remove duplication and make it much easier to add more permutations around
LPUSH, initial write conflicts, or additional retry boundaries.As per coding guidelines,
**/*_test.go: Unit tests should be co-located with packages in*_test.gofiles; prefer table-driven test cases.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@adapter/redis_list_dedup_test.go` around lines 104 - 219, Combine the four TestListPushDedup_* tests into a single table-driven test that iterates cases describing: name, coord setup (newDedupTestCoordinator args like ambiguousDispatch and landed bool), coordinator.beforeDispatch hook (nil or the function that injects the concurrent PutAt for GenuineConflictRecomputes), onePhaseTxnDedup flag on RedisServer, expected outputs (expected length n, coord.dispatches, coord.probeNoOps, and expected list contents/positions checked via resolveListMeta and st.GetAt). Implement a loop that for each case creates st := store.NewMVCCStore(), coord := newDedupTestCoordinator(...), srv := &RedisServer{store:st, coordinator:coord, onePhaseTxnDedup: <case flag>}, runs n, err := srv.listRPush(...), and performs the same assertions currently found in TestListPushDedup_LandedPriorAttempt_NoDuplicate, TestListPushDedup_PriorAttemptDidNotLand_Applies, TestListPushDedup_GenuineConflictRecomputes, and TestListPushDedup_DisabledKeepsLegacyPath, using the case table values to drive expected outcomes and any beforeDispatch behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/design/2026_05_21_proposed_txn_secondary_idempotency.md`:
- Around line 29-33: Update the three fenced code blocks in this document to
include a language identifier (e.g., add "text" after the opening ```),
specifically the block containing the txn trace lines starting with "1411 3
:invoke :txn ..." and the block showing the one-phase apply pseudocode beginning
"one-phase apply of the retry entry E2...", and also the similar fenced block at
lines 229-236; locate these by the exact snippet content and add the language
tag to each opening ``` so MD040 is resolved and rendering/tooling is
consistent.
- Around line 3-17: The document's top header "**Status: proposed — not yet
implemented.**" contradicts later sections that record landed work; change the
status header to reflect partial implementation (e.g., "Status: partially
implemented / in progress") and update the sections that currently assert landed
work so their status markers and tense match (the sections referenced in your
review that record "landed" changes should be annotated as implemented or moved
to a "Completed" subsection); ensure the status string is updated wherever the
original header token appears so the doc is consistent.
In `@kv/coordinator.go`:
- Around line 1021-1023: onePhaseTxnRequest currently hardcodes prevCommitTS to
0 by calling onePhaseTxnRequestWithPrevCommit(..., 0, ...), which drops
PrevCommitTS for single-group dispatch and can reintroduce duplicate writes;
change the helper and call sites to preserve and forward the real prevCommitTS:
update onePhaseTxnRequest to accept a prevCommitTS (or add an overload) and call
onePhaseTxnRequestWithPrevCommit with that value instead of 0, then update
callers in dispatchTxn and buildRedirectRequests to pass the existing
prevCommitTS through so the resulting pb.Request.PrevCommitTS is populated and
the one-phase dedup probe reaches the FSM. Ensure the pb.Request field name
PrevCommitTS (and any local prevCommitTS variables) are consistently used.
---
Nitpick comments:
In `@adapter/redis_list_dedup_test.go`:
- Around line 205-219: The test currently uses newDedupTestCoordinator(st, 0,
false) so it never triggers a retry; change the coordinator to inject one
retryable/ambiguous failure by calling newDedupTestCoordinator(st, 1, false),
then call srv.listRPush as before and update assertions to expect a retry (e.g.,
coord.dispatches == 2) and assert coord.probeNoOps == 0 to verify the
disabled-dedup legacy path recomputes on the second attempt; keep the rest of
the test setup (srv, key, values) unchanged and only adjust the coordinator arg
and the assertions.
- Around line 104-219: Combine the four TestListPushDedup_* tests into a single
table-driven test that iterates cases describing: name, coord setup
(newDedupTestCoordinator args like ambiguousDispatch and landed bool),
coordinator.beforeDispatch hook (nil or the function that injects the concurrent
PutAt for GenuineConflictRecomputes), onePhaseTxnDedup flag on RedisServer,
expected outputs (expected length n, coord.dispatches, coord.probeNoOps, and
expected list contents/positions checked via resolveListMeta and st.GetAt).
Implement a loop that for each case creates st := store.NewMVCCStore(), coord :=
newDedupTestCoordinator(...), srv := &RedisServer{store:st, coordinator:coord,
onePhaseTxnDedup: <case flag>}, runs n, err := srv.listRPush(...), and performs
the same assertions currently found in
TestListPushDedup_LandedPriorAttempt_NoDuplicate,
TestListPushDedup_PriorAttemptDidNotLand_Applies,
TestListPushDedup_GenuineConflictRecomputes, and
TestListPushDedup_DisabledKeepsLegacyPath, using the case table values to drive
expected outcomes and any beforeDispatch behavior.
In `@adapter/redis.go`:
- Around line 3284-3302: The code currently stores a reusableListPush into
pending even when r.coordinator.Dispatch returns a definitive
store.ErrWriteConflict, causing the next iteration to reuse a write set that is
known not to have landed; modify the error-handling after r.coordinator.Dispatch
so that if dispErr is the definitive write-conflict (use errors.Is(dispErr,
store.ErrWriteConflict) or the appropriate sentinel comparison) you do NOT
assign pending = &reusableListPush{...} and simply return the error, while
preserving the existing pending caching behavior for other retryable errors;
refer to r.coordinator.Dispatch, the pending variable and the reusableListPush
struct when making this change.
In `@kv/fsm_onephase_dedup_test.go`:
- Around line 40-112: Replace the three repetitive tests
TestOnePhaseDedup_NoOpsWhenPriorAttemptLanded,
TestOnePhaseDedup_AppliesWhenPriorAttemptDidNotLand, and
TestOnePhaseDedup_FirstAttemptSkipsProbe with a single table-driven test that
iterates cases describing the scenario name, initial state (e.g. pre-landed
write at commit_ts 20 when needed), the onePhaseReq inputs (start_ts, commit_ts,
prev_commit_ts, key, value) and per-case assertions (whether
st.CommittedVersionAt should be true at the fresh commit_ts, expected
LatestCommitTS value, and expected value via st.GetAt). Use the existing helpers
applyFSMRequest and onePhaseReq and instantiate a fresh store.NewMVCCStore() and
fsm := NewKvFSMWithHLC(st, NewHLC()).(*kvFSM) inside each subtest; for the
"landed" case pre-seed the store by calling applyFSMRequest with the initial
attempt as in the old TestOnePhaseDedup_NoOpsWhenPriorAttemptLanded. Ensure each
case runs t.Run(case.name, func(t *testing.T){ t.Parallel() ... }) and performs
the same checks that the original three tests did.
In `@store/committed_version_at_test.go`:
- Around line 19-77: Replace the multiple t.Run subtests in
TestCommittedVersionAt with a single table-driven loop: create a slice of test
cases (fields: name, setup ops as functions or a list describing calls to
newStore/PutAt/DeleteAt, probeKey, probeTS, wantBool) and iterate calling
t.Run(tc.name, func(t *testing.T){ st := newStore(t); perform tc.setup using st
and ctx (use PutAt/DeleteAt as needed); got, err := st.CommittedVersionAt(ctx,
tc.probeKey, tc.probeTS); require.NoError(t, err); require.Equal(t, tc.wantBool,
got) }). Keep references to CommittedVersionAt, PutAt, DeleteAt, newStore and
ctx so each case reproduces the original behaviors (exact hits, absent key,
newer version, tombstone, different key).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 75a9d531-1f65-4482-a64a-fb1155ac75b9
📒 Files selected for processing (16)
adapter/redis.goadapter/redis_list_dedup_test.godocs/design/2026_05_21_proposed_txn_secondary_idempotency.mdkv/coordinator.gokv/fsm.gokv/fsm_onephase_dedup_test.gokv/leader_routed_store.gokv/shard_store.gokv/sharded_coordinator.gokv/transcoder.gokv/txn_codec.gokv/txn_codec_test.gostore/committed_version_at_test.gostore/lsm_store.gostore/mvcc_store.gostore/store.go
…xes (#796 review) Address codex P1 + CodeRabbit Major (both line ~1023 of kv/coordinator.go): the legacy Coordinate.Dispatch path (dispatchTxn at line 465, and leader redirect via buildRedirectRequests at line 957) called bare onePhaseTxnRequest, silently dropping OperationGroup.PrevCommitTS. In a non-sharded deployment or on a leader-redirect, an ambiguous first commit could be retried without the dedup probe key, defeating exactly the path option 2 protects. - Coordinate.dispatchTxn now takes prevCommitTS and forwards it via onePhaseTxnRequestWithPrevCommit. Call site at line 465 passes reqs.PrevCommitTS; the 5 unit-test callers pass 0. - buildRedirectRequests forwards reqs.PrevCommitTS unconditionally — it is the adapter-allocated identity of the prior attempt and is never re-issued by the leader, unlike StartTS/CommitTS. Caller audit confirmed all onePhaseTxnRequest call sites in production now route through the prev-commit-aware path. Lint fixes (reviewdog/golangci): - decodeTxnMetaV2 cyclop 12 -> ~6 by table-driving the optional V2 fields through optionalV2Fields(). - adapter test Dispatch cyclop 12 -> ~7 by extracting the probe to maybeProbe(). Doc nits (CodeRabbit): - Status header upgraded to 'partial — option 2 reader landed everywhere; writer (emission) gated off by default', resolving the conflict with the landed-milestone notes. - All bare code fences carry a language tag (text/go) for MD040.
|
@claude review Round-2 fixes addressing codex P1 + CodeRabbit Major (both flagged the same legacy-Coordinate caller-audit gap) and the two reviewdog cyclop warnings: P1 (codex) / Major (CodeRabbit) —
Cyclop (reviewdog):
Doc nits (CodeRabbit):
Commit: |
|
Claude finished @bootjp's task in 3m 21s —— View job Round-2 Code Review
Verdict: Round-2 changes are correct and the targeted issues are resolved.The P1 (codex) / Major (CodeRabbit) gap — legacy PrevCommitTS Threading (
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fd0831cde7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| _, dispErr := r.coordinator.Dispatch(ctx, &kv.OperationGroup[kv.OP]{ | ||
| IsTxn: true, | ||
| StartTS: pending.startTS, | ||
| CommitTS: commitTS, | ||
| PrevCommitTS: pending.commitTS, | ||
| Elems: pending.ops, | ||
| }) |
There was a problem hiding this comment.
Recompute stale pushes after non-landed retry errors
When onePhaseTxnDedup is enabled and the first push returns a retryable error without landing, this path replays the old item and delta keys after only a PrevCommitTS probe miss. If a pop commits between attempts, it writes a claim/delete plus its own delta and does not necessarily touch the stale push's target sequence; for example, from a one-element list a failed RPUSH plans seq 1, an intervening RPOP removes seq 0, then the replay writes seq 1 with a +1 delta, leaving metadata Head=0 Len=1 so LRANGE scans seq 0 and misses the new item. After a probe miss, the retry needs to recompute or validate the list boundary/read set before applying the captured write set.
Useful? React with 👍 / 👎.
…g pop (#796 codex P1 round-3) codex P1 (06:37, post-fd0831cd): when onePhaseTxnDedup is on and attempt 1 returns retryable without landing, the retry replays the captured ops on top of whatever the world has since become. If an RPOP/LPOP/LTRIM intervened and shrank the boundary, the reused seq can land past the new Tail — physically present but unreachable to LRANGE. Fix: capture the boundary positions (listItemKey(Head), listItemKey(Tail-1)) at attempt 1's meta read and pass them through OperationGroup.ReadKeys to the dispatched one-phase txn. ApplyMutationsRaft validates ReadKeys atomically against startTS under the FSM applyMu, so any intervening boundary-touching commit fires WriteConflict; listPushCoreWithDedup already treats a reuse WriteConflict as 'probe ruled out our prior attempt + seq genuinely contested', clears pending, and recomputes from fresh meta — which is correct because our attempt did not land. Probe-hit branch is unchanged: if our attempt 1 actually committed at T1 the FSM no-ops before the OCC check runs, returning success regardless of what else has happened to the boundary (the post-attempt-1 length we reconstruct is the correct serialized result). The full case matrix is now: - attempt 1 landed, no others → probe hit → no-op (return length) - attempt 1 landed, others touched → probe hit → no-op (return length) - attempt 1 missed, no others → probe miss + ReadKeys OK → apply - attempt 1 missed, boundary touched → probe miss + ReadKeys conflict → recompute - attempt 1 missed, same seq taken → probe miss + write OCC conflict → recompute Also: delete the now-dead onePhaseTxnRequest wrapper (golangci 'unused' lint failure on the fd0831c review run); all production sites moved to onePhaseTxnRequestWithPrevCommit in round-2. Caller audit: no external callers remained. Refine optionalV2Fields comment to match how the table is actually consumed (decode iterates; encode mirrors it via inline conditionals — both must be updated together). Caller audit (loop semantic-change rule): adding ReadKeys to the dedup-path dispatch only widens retry-on-conflict within retryRedisWrite; the client-facing post-push length is unchanged; the legacy gate-off path is untouched. Regression test: TestListPushDedup_InterveningRPopRefusesStaleReuse reproduces the codex scenario from a one-element [A] list and asserts the pushed element is reachable at meta.Head after the fix.
|
@claude review Round-3 fixes addressing codex P1 ( codex P1 — reuse-after-meta-change can hide writes ( Fix: capture the boundary positions ( Full case matrix:
Regression test: Lint / claude review minors:
Caller audit (loop semantic-change rule): adding ReadKeys to the dedup-path dispatch only widens retry-on-conflict within Commit: |
|
Claude finished @bootjp's task in 5m 12s —— View job Round-3 Code Review
Verdict: Round-3 changes are correct. The codex P1 gap is closed.The boundary-fence fix, regression test, dead-code deletion, and comment correction are all sound. Two non-blocking observations follow. Boundary-Fence Correctness (
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 65daa3a2f2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| length: updatedMeta.Len, | ||
| readKeys: boundaryReads, | ||
| } | ||
| return errors.WithStack(dispErr) |
There was a problem hiding this comment.
Keep dedup state for non-retryable ambiguous errors
When the first dispatch returns an ambiguous error outside store.ErrWriteConflict/kv.ErrTxnLocked (for example a caller deadline/cancellation after the proposal may still apply), this path records pending but then returns the original dispErr to retryRedisWrite; adapter/redis_retry.go only retries write conflicts and txn locks, so the saved write set is discarded as the command returns. A client retry then starts a fresh listPushCoreWithDedup with no PrevCommitTS and recomputes a new sequence, leaving the duplicate-list-element case unfixed for those ambiguity modes.
Useful? React with 👍 / 👎.
| // docs/design/2026_05_21_proposed_txn_secondary_idempotency.md). It | ||
| // MUST stay off until every node runs a probe-aware binary — see R5 | ||
| // (FSM determinism across a rolling upgrade). Default off; enabled via | ||
| // WithOnePhaseTxnDedup / the ELASTICKV_REDIS_ONEPHASE_DEDUP env var |
There was a problem hiding this comment.
Wire the advertised env flag into server construction
This says operators can enable the feature with ELASTICKV_REDIS_ONEPHASE_DEDUP, but a repo-wide search shows that string only appears in docs/comments, and NewRedisServer only reads ELASTICKV_REDIS_TRACE; cmd/server/demo.go also never passes WithOnePhaseTxnDedup. In the production server path, setting the documented env var therefore has no effect and the feature remains disabled after rollout.
Useful? React with 👍 / 👎.
…ve on retryable (#796 codex P1+P2 round-4) codex P1 (06:51, adapter/redis.go:3339) — Keep dedup state for non-retryable ambiguous errors: listPushCoreWithDedup recorded 'pending' unconditionally on any first-attempt error, but retryRedisWrite only loops on store.ErrWriteConflict / kv.ErrTxnLocked, so for other errors the saved write set was discarded with the goroutine — misleading and useless. Fix the code's honesty: only set/advance 'pending' when isRetryableRedisTxnErr will actually trigger the loop. Visible behavior is unchanged in the non-retryable case (pending was always going to be lost), but the code no longer pretends to hold state across a request boundary. The broader 'ambiguous error escapes retryRedisWrite then the client retries the command' window (context-deadline / NOTLEADER → fresh listPushCoreWithDedup with no PrevCommitTS) is a separate problem space requiring a cross-request idempotency cache — out of scope for option 2 and noted in the doc as a follow-up. codex P2 (06:51, adapter/redis.go:249) — wire the advertised env flag: ELASTICKV_REDIS_ONEPHASE_DEDUP was documented as an opt-in but NewRedisServer never read it. Add the env-var read in the struct literal next to ELASTICKV_REDIS_TRACE (default off; WithOnePhaseTxnDedup option can still override either way). Caller audit (loop semantic-change rule): - pending-save gating is semantically a no-op (the unsaved state was always discarded by retryRedisWrite returning); no behavior change for non-retryable errors, no caller impact. - env wiring expands the opt-in surface. NewRedisServer is the only production constructor path; tests that bypass it (literal &RedisServer) set onePhaseTxnDedup directly and are unaffected. Also tighten TestListPushDedup_InterveningRPopRefusesStaleReuse's 'dispatches > 2' to '== 3' per claude round-3 minor (pins the exact sequence so a future regression that piles extra dispatches before success doesn't slip past).
|
@claude review Round-4 addressing codex P1 + P2 posted at 06:51 (post-65daa3a2) plus your round-3 minor on the loose dispatch-count assertion. codex P1 — Fix: only set/advance Scope-defining note: the broader window where an ambiguous error escapes codex P2 — env flag advertised but never wired ( Round-3 minor (claude): tightened Caller audit (loop rule):
Commit: |
|
Claude finished @bootjp's task in 3m 22s —— View job Round-4 Code Review
Verdict: Round-4 changes are correct. Both targeted issues are resolved.The pending-save gating and env wiring changes are sound. The dispatch count is correctly tightened. No new issues introduced. P1 — Pending-save gating (
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7781725d38
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // Either the prior attempt landed (probe hit → no-op) or this | ||
| // reuse applied the identical write set. The post-push length | ||
| // is the same either way, so reconstruction is just pending.length. | ||
| newLen = pending.length |
There was a problem hiding this comment.
Recompute length after reused push applies
When a saved attempt did not land and another client changes the list length without touching this attempt’s boundary/write keys (for example, an RPUSH gets a retryable pre-apply error, then an LPUSH commits before this reuse), the reused dispatch can apply successfully but this line still returns the length calculated from the first attempt’s old snapshot. The legacy retry path recomputed in this window, so this can now return 2 while the list actually has 3 elements; the retry path needs to either force a recompute/conflict on intervening length-changing writes or distinguish a probe no-op from a fresh apply.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8dec9f60c9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| currentMeta, _, mErr := r.resolveListMeta(ctx, key, r.readTS()) | ||
| if mErr != nil { | ||
| return 0, mErr | ||
| } | ||
| return currentMeta.Len, nil |
There was a problem hiding this comment.
Preserve landed retry length on non-leader probes
When a list-push retry is resolved on a node that is no longer leader, LeaderRoutedStore.CommittedVersionAt now returns (false, nil), so a reuse whose previous attempt actually landed falls through here and returns the leader's current list length. If another client changes the list between the original append and this retry response, RPUSH/LPUSH can return that later length even though this command's element was inserted at the earlier position, which breaks the command's linearizable Redis result. Fresh evidence since the prior comment: the new non-leader false fallback still drives this resolveListMeta branch instead of preserving pending.length for a no-op retry.
Useful? React with 👍 / 👎.
…#796 codex P1 round-7 follow-up) codex P1 round-7 (preserve landed-retry length on non-leader probes): the previous round-7 fix returned (false, nil) unconditionally on a deposed leader, which let resolveReuseLength take the resolveListMeta fallthrough even when our prior attempt had actually committed. The caller then returned the leader's current Len rather than pending.length, which under concurrent writes could include other clients' commits and break the per-our-commit linearizable Redis result. Recover the fast path without adding a new RPC: on the non-leader path, submit a Raft ReadIndex via the existing LinearizableRead primitive (the coordinator's on LeaderRoutedStore; the per-group engine's on ShardStore). ReadIndex forwards to the current leader and waits for our local applied index to reach the leader's commit point — after that, the local probe sees every committed version, so we can answer the exact-ts question authoritatively from the local replica. If ReadIndex fails (no leader, ctx canceled), keep the existing (false, nil) fallback so the adapter re-reads via the already-leader-fenced ScanAt/GetAt and still returns a valid (current-Len) serialization. Cost: one ReadIndex round-trip on the deposed-node retry path. Bounded to a rare situation (retry under leader churn); the leader-with-valid- lease fast path is unchanged. Caller audit (loop semantic-change rule): - adapter/redis.go:3294 (RedisServer.resolveReuseLength): now sees authoritative true/false during churn, so the pending.length branch fires when our prior actually landed. The fallthrough to resolveListMeta still triggers correctly when the probe genuinely misses or ReadIndex fails. - kv/fsm.go:505: still routes through the raw store (not these wrappers), so the deterministic FSM apply path is untouched. All six dedup tests pass; gofmt + vet clean.
|
@claude review Round-8 addressing the new codex P1 at line 3306 (post-round-7): Preserve landed retry length on non-leader probes. Round-7 returned Fix: recover the fast path without a new RPC. On the non-leader path, submit a Raft ReadIndex via the existing Cost: one ReadIndex round-trip on the deposed-node retry path (a rare situation). The leader-with-valid-lease fast path is unchanged. Caller audit (loop rule):
All 6 dedup tests + kv/store suites pass; gofmt + vet clean. Commit: |
|
Claude finished @bootjp's task in 3m 30s —— View job Round-8 Code Review
Verdict: Round-8 changes are correct. The codex P1 "preserve landed retry length on non-leader probes" gap is closed on both wrappers.The ReadIndex-fence approach is the right mechanism and the implementation is sound end-to-end. Three precision notes follow.
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bc11779d7e
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if _, err := engine.LinearizableRead(ctx); err != nil { | ||
| return false, nil |
There was a problem hiding this comment.
Proxy sharded exact-version probes to the leader
When Redis is backed by ShardStore and a list-push retry is served on a follower or deposed replica, this fallback reports a miss: the etcd raft engine’s LinearizableRead is leader-only for followers/learners, so this branch returns (false, nil) instead of querying the leader for the exact version. After Dispatch has forwarded the retry and the leader no-oped a previously landed attempt, resolveReuseLength falls back to the leader’s current list length; if another client changed the list after the original commit, RPUSH/LPUSH returns that later length rather than this command’s linearized result. Fresh evidence beyond the prior single-group comment is that this new ShardStore.CommittedVersionAt path has the same false fallback for sharded stores.
Useful? React with 👍 / 👎.
| // so that validateReadOnlyShards can issue a linearizable read barrier, | ||
| // preserving SSI. | ||
| return c.dispatchSingleShardTxn(ctx, startTS, commitTS, primaryKey, gids[0], elems, readKeys) | ||
| return c.dispatchSingleShardTxn(ctx, startTS, commitTS, prevCommitTS, primaryKey, gids[0], elems, readKeys) |
There was a problem hiding this comment.
Reject retries that leave the one-phase path
This single-shard branch is the only place prevCommitTS is preserved; if a retry carries PrevCommitTS but its mutations or read keys span shards, execution falls through to the 2PC path (prewriteTxn/commitPrimaryTxn), whose log builders encode only CommitTS and silently drop the dedup probe. In that multi-shard retry scenario a landed ambiguous attempt is treated as an ordinary write conflict, the adapter drops the saved write set and recomputes, which can reintroduce the duplicate push this feature is meant to prevent; if multi-shard retries are out of scope, fail them here instead of accepting PrevCommitTS without honoring it.
Useful? React with 👍 / 👎.
…lse positive (#796 lint) reviewdog/golangci on bc11779 flagged two nilerr violations from the LinearizableRead-fence pattern I introduced in round-8: kv/leader_routed_store.go:226 error is not nil (line 225) but it returns nil kv/shard_store.go:143 error is not nil (line 142) but it returns nil The nilerr signature is 'if err != nil { return nil }'. My intent — treating ReadIndex failure as 'couldn't verify, fall back to the leader-routed slow path' rather than as a real error to surface to the caller — is correct and matches LatestCommitTS's contract, but the linter has no way to see that. Extract small helpers (LeaderRoutedStore.tryLinearizableFence and the package-level tryEngineLinearizableFence) that report ReadIndex success as a bool, then express the call site as 'if !leaderOK && !fenceOK { fall back }'. Same semantics, no error swallowed at a return statement, and the linter is satisfied without a //nolint comment. Behavior unchanged: all six dedup tests pass; gofmt + vet clean.
|
@claude review Round-9 fixing the two The nilerr signature is Fix: extract small helpers ( if !leaderOK && !fenceOK {
return false, nil
}Same semantics, no error swallowed at a return statement, no All 6 dedup tests pass; gofmt + vet clean. Commit: |
|
Claude finished @bootjp's task in 2m 49s —— View job Round-9 Code Review
Verdict: Round-9 changes are correct. Both
|
| State | !leaderOK |
!fenceOK |
Old result | New result |
|---|---|---|---|---|
| Leader, valid lease | false | (short-circuit) | read local | read local ✓ |
| Not leader, ReadIndex succeeds | true | false | read local | read local ✓ |
| Not leader, ReadIndex fails | true | true | (false, nil) |
(false, nil) ✓ |
| Not leader, coordinator nil | true | true (false returned by helper) | (false, nil) |
(false, nil) ✓ |
The coordinator-nil case previously had an explicit early return (if s.coordinator == nil { return false, nil }). In R9, tryLinearizableFence guards s.coordinator == nil → return false → !fenceOK = true → combined condition true → same result. ✓
Nilerr Fix — ShardStore.CommittedVersionAt (kv/shard_store.go:133)
Old R8 pattern (triggered nilerr):
if !isLinearizableRaftLeader(ctx, engine) {
if _, err := engine.LinearizableRead(ctx); err != nil {
return false, nil // ← nilerr
}
}New R9 pattern (clean):
if !isLinearizableRaftLeader(ctx, engine) && !tryEngineLinearizableFence(ctx, engine) {
return false, nil
}Semantically identical under the same case analysis. The isLinearizableRaftLeader short-circuit means no tryEngineLinearizableFence is called when the leader check passes, preserving the R8 cost profile. ✓
Note (non-blocking, pre-existing from R8): For the "leader with expired lease AND first ReadIndex fails" case, isLinearizableRaftLeader internally calls engine.LinearizableRead and fails, then tryEngineLinearizableFence tries a second ReadIndex. This double-ReadIndex was present in R8 and is preserved in R9 — not introduced by the refactor. It's bounded to a narrow failure window (expired lease + unreachable leader).
Helper Design
tryLinearizableFence (method on LeaderRoutedStore) vs tryEngineLinearizableFence (package-level function): The asymmetry is intentional and correct. LeaderRoutedStore holds the Coordinate interface, while ShardStore accesses the raftengine.LeaderView directly. The method form allows the s.coordinator == nil guard to be expressed naturally inside the helper; the package-level form takes the engine as a parameter and guards engine == nil.
Both helpers have clear comment blocks explaining the intentional error-suppression design — this is the key requirement for avoiding a //nolint directive (the intent is readable, the type system expresses it). ✓
Codex P2 from R8 — kv/sharded_coordinator.go:393 (new, not addressed in R9)
The latest codex review (on bc11779) flagged that multi-shard retries with a non-zero PrevCommitTS fall through to the 2PC path (prewriteTxn/commitPrimaryTxn), which doesn't encode PrevCommitTS in the log entries. The dedup probe never reaches the FSM for multi-shard retries.
In practice, the list-push dedup feature operates only on single-key (single-shard) RPUSH/LPUSH commands, so this gap doesn't affect the primary use case. Codex's suggestion — fail multi-shard retries with non-zero PrevCommitTS explicitly rather than silently honoring them without dedup semantics — is a reasonable defensive measure. Worth tracking for a follow-up but not blocking merge.
Carry-over from Prior Rounds (unchanged)
-
Open P1 from Round-6 (
adapter/redis.go,dispatchListPushReuse): self-inflicted reuse conflict — reuse dispatch applies atcommitTS2but returns ambiguous error; next retry seesErrWriteConflictfrom its own committed write, clearspending, and potentially appends a duplicate. The fix is to probeCommittedVersionAt(firstWriteKey(pending.ops), pending.commitTS)before treating a reuse-pathWriteConflictas a genuine third-party conflict. -
CommittedVersionAtretention guard (store/lsm_store.go:778,store/mvcc_store.go:476): Both backends return(false, nil)whencommitTS < minRetainedTSinstead ofErrReadTSCompacted. In the degraded case,resolveReuseLengthgracefully falls through toresolveListMeta. Low urgency while compaction windows far exceed retry windows. -
buildRedirectRequestsunit test: No test verifiesPrevCommitTSsurvives the leader-redirect path with a non-zero value. Code is correct (verified round-2); gap is test coverage only.
Summary: Round-9 is a correct, minimal lint fix. The nilerr violations are resolved by converting the error-used-as-bool pattern into properly-typed bool-returning helpers with clear documentation. No behavior change. The open P1 (self-inflicted reuse conflict), the new codex P2 (multi-shard PrevCommitTS silent drop), and the two carry-over items remain the only open threads before merge.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b4695108a2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return pending.length, nil | ||
| } | ||
| } | ||
| currentMeta, _, mErr := r.resolveListMeta(ctx, key, r.readTS()) |
There was a problem hiding this comment.
Avoid failing after a reuse apply
When the prior attempt did not land and the reused write set applies successfully, this post-commit resolveListMeta can still fail (for example, the list had exactly MaxDeltaScanLimit deltas at the first read and an intervening non-conflicting push pushes it over the truncation limit). That returns an error to the client after the append has already committed; a client retry starts a fresh command with no pending state and can append the same element again. The legacy path did not add a fallible metadata read after a successful dispatch, so the reuse path needs to avoid surfacing this as a failed write or preserve enough idempotency state for the retry.
Useful? React with 👍 / 👎.
Round 10 of /loop review. Addresses 5 outstanding findings from
gh api repos/bootjp/elastickv/pulls/796/comments:
P1 (self-inflicted conflict) — adapter/redis.go:dispatchListPushReuse
When a reuse-path dispatch returns ErrWriteConflict but our own
reused commit_ts already landed (e.g. raft re-applied while caller
saw conflict), surface success instead of bubbling the conflict.
Probe CommittedVersionAt(probeKey, commitTS); if it hits, treat as
no-op and resolve length via the normal reuse path.
P1 (linchpin doc) — docs/design/2026_05_21_proposed_txn_secondary_idempotency.md
Re-prove the "E1 applies before E2 or never" race-freedom argument
under the *cancellable* caller ctx that the current code threads
through (applyRequests(ctx, ...) → Engine.Propose(ctx, ...) with a
live ctx.Done() early-return). Ordering now derives from FIFO
proposeCh + monotonic raft log indices and the fact that retries
are fresh proposals on the new leader, not re-issues — not from
context.Background() blocking.
P2 (retention guard) — store/{lsm_store,mvcc_store}.go:CommittedVersionAt
Return ErrReadTSCompacted distinctly when the probed commit_ts is
below the retention watermark; downstream callers (FSM, adapter)
treat as "couldn't probe, fall back to safe path" — uniform with
every other ErrReadTSCompacted caller in the tree (audited).
P2 (multi-shard reject) — kv/sharded_coordinator.go:dispatchTxn
Fail-closed: return ErrTxnDedupRequiresSingleShard when an
incoming txn request carries PrevCommitTS but its write set spans
shards. The 2PC log builders encode only CommitTS, so honoring
such a request would silently drop the dedup probe at the FSM and
let the original :duplicate-elements anomaly reappear.
P2 (avoid failing after reuse apply) — adapter/redis.go:resolveReuseLength
On reuse, if the post-commit meta-read fails, fall back to
pending.length rather than propagating the read error — the apply
has already landed, so refusing the client response is strictly
worse than reporting the locally-known length.
Caller audit (per /loop semantic-change rule):
- CommittedVersionAt: 5 call sites (FSM, 2× adapter reuse paths,
ShardStore router, LeaderRoutedStore router). All treat
(false, ErrReadTSCompacted) as "fall back to safe path"; uniform.
- ErrTxnDedupRequiresSingleShard: single emit site (multi-shard
reject); surfaced to client as a commit error — fail-closed by
design.
- ErrReadTSCompacted: existing callers (grpc adapter, dualwrite,
s3 adapter, kv compactor) already treat as soft signal. New FSM
+ adapter call sites are consistent.
Tests:
- adapter/redis_list_dedup_test.go: added
TestListPushDedup_SelfInflictedReuseConflict_ReturnsSuccess for
the P1 regression (attempt 1 pre-rejects, reuse lands then errors
with WriteConflict → expect n=1, dispatches=2, probeNoOps=0).
- go test ./adapter/ -run ListPushDedup → 7/7 pass.
- go test ./kv/ ./store/ → both packages pass.
- gofmt + go vet → clean.
|
@claude review Round 10: addressed all 5 outstanding inline findings (commit 71430f2). P1
P2 Caller audit (per the /loop semantic-change rule):
Tests: |
TLA+ spec divergence review (auto-triggered)This PR touches files that the TLA+ safety spec has an anchor on (per Anchored files changed in this PR head (71430f2):
What to check, by subsystem:
If the change is correct but requires a spec update, edit @claude review please verify TLA+ spec divergence per the checklist above. @codex review please verify TLA+ spec divergence per the checklist above. |
|
To use Codex here, create a Codex account and connect to github. |
|
Claude finished @bootjp's task in 5m 48s —— View job Round-10 Code Review
Verdict: Round-10 changes are correct and complete. All 5 open items are resolved.This is the cleanest round yet. Each fix is well-motivated by the prior review thread and each has a precise test that pins the exact invariant. No new issues introduced. One minor observation and two carry-over items remain after this round. P1 — Self-Inflicted Reuse Conflict (
|
| ErrWriteConflict case | Probe result | Action |
|---|---|---|
| Reuse landed at T2 (self-inflicted) | CommittedVersionAt(probeKey, T2) == true |
Update pending.commitTS = T2, call resolveReuseLength, return success |
| Genuine third-party conflict | CommittedVersionAt(probeKey, T2) == false |
Drop pending, recompute |
| Probe itself errors / empty probeKey | — | Same as miss: drop pending, recompute (safe conservative fallback) |
Key design choices verified:
- Probe uses the REUSE's commitTS (T2), not T1: correct. A self-inflicted conflict means our reuse write at T2 is now conflicting with itself, not T1.
pending.commitTS = T2after probe hit: correct. The next retry (if any) must probe T2, since that is now the prior-committed version identity. Without this update, a subsequent retry would probe the wrong point.- Probe-error as miss: correct. If we can't probe T2 (store failure), recomputing is the safe fallback — we'd lose the "remember prior length" optimisation in an extreme scenario, but no duplicate write occurs.
Regression test correctness (TestListPushDedup_SelfInflictedReuseConflict_ReturnsSuccess): the test correctly simulates leadership churn by having the mock coordinator apply the reuse write atomically and then return ErrWriteConflict. The probe of T2 hits (the write is in the store), pending.commitTS is updated, resolveReuseLength returns pending.length == 1, and no recompute dispatch fires (coord.dispatches == 3, not 4). ✓
FSM Handling of ErrReadTSCompacted (kv/fsm.go:506–521)
The three-case switch is the exactly right structure:
switch {
case perr == nil && landed: // no-op: prior committed, skip apply
case perr == nil: // probe miss: fall through to apply
case errors.Is(perr, store.ErrReadTSCompacted): // can't tell: apply (safer than wrong no-op)
default: // unexpected error: surface it
}The ErrReadTSCompacted branch's comment correctly explains the tradeoff:
- Wrong no-op: would silently drop the retry's mutations forever — unacceptable
- Hard failure: aborts every replica deterministically on a benign compaction condition — Raft FSM must be deterministic
- Fall through to apply: a stale extra delta is a strict improvement — the prior round's concern is now explicitly addressed in code
The default branch surfaces unexpected errors via errors.WithStack(perr). This is correct: we only swallow ErrReadTSCompacted because it has a well-understood fallback; unknown errors from the store deserve visibility. ✓
Retention Guard (store/lsm_store.go:774–797, store/mvcc_store.go:476–491)
Both backends:
- Check
readTSCompacted(commitTS, effectiveMinRetainedTS())before the Pebble/in-memory lookup ✓ - Return
(false, ErrReadTSCompacted)on the compacted case ✓
The compaction guard is now surface-uniformly across all timestamped store operations, and callers have explicit handling paths for it at every layer:
- FSM: explicit
errors.Iscase → fall through to apply (correct) resolveReuseLength:defaultcase in switch → returnpending.length(correct)dispatchListPushReuseself-conflict probe: probe-error-as-miss → recompute (correct)LeaderRoutedStore/ShardStorewrapper: wrappers don't check this specifically, they delegate to the raw store — the ErrReadTSCompacted from the raw store propagates up throughLeaderRoutedStore.CommittedVersionAtandShardStore.CommittedVersionAtto the caller above. Verify:ShardStore.CommittedVersionAtreturnsg.Store.CommittedVersionAt(...)when the leader fence passes — this correctly propagates the error. ✓
Multi-Shard PrevCommitTS Reject (kv/sharded_coordinator.go:405–407)
Placement is correct: the guard fires AFTER the single-shard fast path has already returned (line 387), so it is unreachable for any realistic RPUSH/LPUSH command (always single-key, always single-shard). The guard is defensive against future regressions where the dedup opt-in could accidentally be combined with multi-shard write sets.
The comment at lines 396–403 accurately explains why silent drop is wrong: the 2PC log builders encode only CommitTS, so the FSM probe would never fire on the secondary commit path, and the adapter's recompute-on-conflict fallback would re-append the duplicate.
One thing to confirm: ErrTxnDedupRequiresSingleShard is not in isRetryableRedisTxnErr, so it surfaces to the client as a command error (not retried by retryRedisWrite). That is the intended fail-closed behavior. ✓
resolveReuseLength Meta-Read Fallback (adapter/redis.go:3320–3339)
The switch/fallthrough structure is cleaner than the if-chain in prior rounds:
switch {
case perr == nil && hit: → return pending.length (FSM no-op'd at prevCommitTS)
case perr == nil: → fall through to resolveListMeta (FSM applied at fresh ts)
default: → return pending.length (degrade gracefully)
}And then resolveListMeta failure returns pending.length as a second-level fallback. The docblock covers three specific failure modes (ErrReadTSCompacted, other probe errors, MaxDeltaScanLimit overflow on meta-read). The reasoning is sound: surfacing an error AFTER a committed apply would drive the client into a retry with no pending state, defeating the dedup mechanism entirely.
Correctness for the self-inflicted case: after a self-inflicted conflict fix updates pending.commitTS = T2, resolveReuseLength probes T2 (not T1). The probe hits (we just landed at T2), and pending.length (from T1's snapshot) is returned. This is correct: the reuse applied the same ops as T1 would have (same seq numbers), so the post-push length is identical. ✓
Linchpin Doc Race-Freedom Re-Proof
The corrected proof is valid. The key fix: race-freedom was previously attributed to context.Background() preventing Propose from early-returning, which is factually wrong (the actual path threads the caller ctx). The correct invariant chain is:
- Sequential dispatch: adapter blocks on E1's dispatch returning before issuing E2
- FIFO
proposeCh: E1 is enqueued in the channel before E2 (guaranteed by Configure Renovate #1 + single-goroutine dispatch) - Monotonic log indices: Raft assigns indices in dequeue order from
proposeCh - Quorum commitment: if E1 committed on leader L1, its log index propagated to a majority → L2 inherits it at a lower index than E2's fresh-tail assignment
The cancelPendingProposal subtlety is correctly handled: cancellation removes the notification entry but the payload may still append to the log. On retry, the adapter creates a NEW proposal (fresh commitTS, fresh request ID), so there is no risk of "E1 being re-proposed at a higher index than E2." The dedup mechanism handles the case where E1's fate is ambiguous (committed-but-notification-lost). ✓
Minor: dispatchListPushReuse probe-error fallback not commented
When CommittedVersionAt(probeKey, commitTS) returns (_, perr) with perr != nil in the self-inflicted conflict guard (lines 3276–3280), the error is silently treated as a miss — the code falls through to return 0, true, errors.WithStack(dispErr). This is the correct safe fallback (if we can't probe, recompute), but unlike every other probe-error path in this file, there is no comment explaining the intent. A one-line note would help the next reader distinguish intentional silence from missing error handling.
Carry-over from Prior Rounds (unchanged, non-blocking)
buildRedirectRequestsunit test (from R2): no test verifiesPrevCommitTSsurvives the leader-redirect path. Code is correct (verified R2); test coverage gap only.- Double ReadIndex on expired-lease leader path (from R8): when the leader check in
isLinearizableRaftLeaderfails (expired lease) and issues one ReadIndex, thentryEngineLinearizableFenceissues a second one. Bounded to a narrow expired-lease-plus-unreachable-leader window. Pre-existing from R8, not introduced by R10.
Summary: Round-10 is correct and comprehensive. All five open items (self-inflicted reuse conflict, linchpin doc, retention guard, multi-shard reject, meta-read fallback) are addressed cleanly. The FSM's three-case ErrReadTSCompacted switch is the standout piece — the reasoning is correct and the comment explains the tradeoff exactly. Ready to merge with the two non-blocking carry-over items tracked as future work.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 71430f2039
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| case errors.Is(perr, store.ErrReadTSCompacted): | ||
| // The prior attempt's commit_ts is below the retention watermark | ||
| // so we cannot tell "didn't land" from "reclaimed" (codex P2). | ||
| // Fall through to apply normally: a stale extra delta is a |
There was a problem hiding this comment.
Do not branch FSM apply on local compaction
When a retry entry with PrevCommitTS is applied after some replicas have advanced their local retention watermark but others have not, this branch makes compacted replicas fall through and apply at the fresh commitTS while replicas that still retain the previous version return nil and no-op. FSM compaction is driven per runtime from local wall clock/store state rather than by this raft entry, so ErrReadTSCompacted is not a deterministic input to raft apply; this can leave different replicas with different MVCC versions for the same log entry.
Useful? React with 👍 / 👎.
| pending.commitTS = commitTS | ||
| length, lerr := r.resolveReuseLength(ctx, key, pending) |
There was a problem hiding this comment.
Re-read length after fresh self-conflict apply
In the self-inflicted ErrWriteConflict path, setting pending.commitTS to the fresh attempt before calling resolveReuseLength makes the exact-version probe hit and return pending.length, even though this path just proved the reused write set applied at the fresh commitTS rather than no-oping a prior attempt. If the first attempt did not land and another non-conflicting list change committed between attempts (for example an empty-list RPUSH retry with an intervening LPUSH at a different sequence), the actual post-apply length is larger but the client still gets the stale first-attempt length; fresh evidence beyond the earlier length comment is this new self-conflict branch's update of pending.commitTS before reconstruction.
Useful? React with 👍 / 👎.
Round 11 of /loop review on PR #796. Addresses new findings on commit 71430f2 (round-10). codex P1 (kv/fsm.go:514) — Do not branch FSM apply on local compaction Round-10 added an ErrReadTSCompacted fall-through branch in the one-phase dedup probe, citing "stale extra delta is strict improvement". Codex correctly flagged this as non-deterministic across replicas: FSM compaction advances minRetainedTS from local wall clock and per-replica scheduler (kv/compactor.go:safeMinTS), so replicas applying the same log entry at the same log index can disagree on the probe outcome — compacted-replica falls through and applies at the fresh commit_ts, non-compacted replica returns landed and no-ops. Split-brain. Fix: revert the retention guard inside CommittedVersionAt itself (both pebbleStore and mvccStore). The probe is now a single pebble.Get / sorted-slice lookup and never returns ErrReadTSCompacted. For the option-2 use case this is deterministic across replicas — each per-element key carries a single MVCC version that physical compaction does not retire (no superseding version). The cluster-wide invariant retentionWindow > max retry latency keeps the rare never-landed-below-compacted-floor case out of reach. Removed the now-dead ErrReadTSCompacted branch in FSM apply; the helper dedupProbeOnePhase has the simple shape landed → no-op, not-landed → apply, err → propagate. Earlier round-10 P2 finding is explicitly reverted with rationale documented in the store and design doc. ErrReadTSCompacted continues to surface from GetAt/ScanAt/RawGetAt/ RawScanAt (genuine historical reads) where it expresses the truthful "value at this timestamp is no longer recoverable". codex P2 (adapter/redis.go:3278) — Re-read length after fresh self-conflict apply The round-10 self-inflicted-conflict guard set pending.commitTS = commitTS (fresh attempt) before calling resolveReuseLength, which probes at pending.commitTS, hits, and returns pending.length — the previous attempt's projected length. If a non-conflicting list mutation committed between attempts, the actual post-apply length is larger but the client gets the stale prior-attempt value. Fix: split into two helpers. - resolveReuseLength: probe at pending.commitTS; on hit return pending.length (prior attempt landed, no intervening write affected this list). - resolveLengthAfterFreshApply: skip the probe shortcut; always re-read meta. Used by the self-conflict branch where we know the fresh commit_ts landed and pending.length is stale. Both fall back to pending.length on resolveListMeta failure to honor codex P2 round-10 (avoid failing after a reuse apply). reviewdog/golangci-lint failures 1. kv/fsm.go:489 cyclop 13 > 10 — extracted dedupProbeOnePhase helper. Determinism rationale lives at the call site; the helper keeps the surface area small. 2. adapter/redis.go:3320 unparam (resolveReuseLength's error always nil) — dropped the error return; the degraded-fallback contract is now expressed in the signature. 3. adapter/redis.go:3336 nilerr — same fix (signature change), no longer swallows an error explicitly. 4. kv/sharded_coordinator.go:565 cyclop 11 > 10 on dispatchTxn — extracted dispatchMultiShardTxn helper carrying the 2PC branch including the round-10 ErrTxnDedupRequiresSingleShard reject. Caller audit (per /loop semantic-change rule): - CommittedVersionAt: 5 call sites (FSM dedup probe via new helper, adapter dispatchListPushReuse self-conflict guard, adapter resolveReuseLength probe shortcut, ShardStore router, LeaderRoutedStore router). With retention guard removed, no caller sees ErrReadTSCompacted here; existing err != nil paths still fire on genuine pebble/network errors and degrade gracefully. - ErrReadTSCompacted: still emitted by GetAt, ScanAt, RawGetAt, RawScanAt. Existing consumers (adapter/grpc.go, adapter/s3.go, proxy/dualwrite.go, kv/compactor.go) unchanged. - resolveReuseLength: signature changed to drop error return. Sole callers are dispatchListPushReuse (updated) and doc comments in routers; no test mocks return error from this shape. Tests: - go test ./adapter/ -run 'ListPush|Dedup|TestRedis_LPUSH|TestRedis_RPUSH' passes. - go test ./kv/ ./store/ both pass. - gofmt, go vet, golangci-lint run all clean (0 issues across adapter/kv/store packages).
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
TLA+ spec divergence review (auto-triggered)This PR touches files that the TLA+ safety spec has an anchor on (per Anchored files changed in this PR head (04876c0):
What to check, by subsystem:
If the change is correct but requires a spec update, edit @claude review please verify TLA+ spec divergence per the checklist above. @codex review please verify TLA+ spec divergence per the checklist above. |
|
To use Codex here, create a Codex account and connect to github. |
|
@claude review Round 11: addressed the 2 codex findings + 4 reviewdog lint failures on round-10 commit codex P1 — Do not branch FSM apply on local compaction ( Codex correctly flagged that branching FSM apply on Fix: reverted the retention guard inside
codex P2 — Re-read length after fresh self-conflict apply ( The round-10 self-inflicted-conflict guard set Fix: split into two helpers — reviewdog/golangci-lint failures (4)
Caller audit (per /loop semantic-change rule):
Tests: |
|
Claude finished @bootjp's task in 6m 17s —— View job Round-11 Code Review
Verdict: Round-11 changes are correct. All 6 items (2 codex + 4 lint) are resolved.The retention-guard revert is the right call. The self-conflict length split is correct. No new issues introduced. P1 —
|
## Summary Follow-up to #796 (M1 + M2 + M3 RPUSH/LPUSH). Extends option-2 write-set reuse + exact-ts dedup probe to **MULTI/EXEC** dispatched through `runTransaction`. Mirrors `listPushCoreWithDedup` at the EXEC granularity. Gated on the existing `onePhaseTxnDedup` flag — **default off** — so no mixed-version divergence window. This PR is **stacked on #796**: the base branch is `docs/txn-idempotency-design`. Once #796 merges this rebases onto `main`. ## Mechanism 1. **`txnContext.commit()` refactor** — split build-and-dispatch into `prepareDispatch()` returning a `preparedTxnDispatch{elems, commitTS, readKeys, ctx, cancel}`. `commit()` now calls `prepareDispatch()` then dispatches; `runTransactionWithDedup` calls them separately so it can intercept between prepare and dispatch. External behavior of `commit()` is unchanged. 2. **`reusableExecTxn`** — EXEC analogue of `reusableListPush`. The cached `results` array is the M3 R1 result reconstruction: computed once from attempt 1's `startTS` snapshot, returned as-is on any reuse. Same invariance argument as RPUSH/LPUSH `length`. 3. **`dispatchExecReuse`** — one reuse iteration. Dispatches with `PrevCommitTS=pending.commitTS`. On `WriteConflict`, the self-inflicted-conflict guard probes `CommittedVersionAt(probeKey, freshCommitTS)`; if hit, return cached results (codex P1 round-10 class). Otherwise drop pending. 4. **`runTransactionWithDedup` + `firstExecAttempt`** — the option-2 retry loop. First iteration builds the txn and dispatches; retryable failure stashes pending; subsequent iterations call `dispatchExecReuse` with `PrevCommitTS`. 5. **`runTransaction` gate** — when `onePhaseTxnDedup` is on, route to `runTransactionWithDedup`; otherwise keep the legacy retry loop byte-identical. ## Caller audit (per /loop semantic-change rule) - `prepareDispatch` (new): callers are `commit()` and `firstExecAttempt`; both honor the `defer prepared.cancel()` contract. - `commit()`: internal structure changed; external behavior preserved (no test directly invokes it). - `runTransactionWithDedup` / `firstExecAttempt` / `dispatchExecReuse` / `reusableExecTxn`: all new symbols, exercised only from the gated `runTransaction` path. ## Tests (`adapter/redis_exec_dedup_test.go`) | Test | Scenario | Pins | |---|---|---| | `TestExecDedup_LandedPriorAttempt_ReturnsCachedResults` | Attempt 1 lands then errors | Reuse FSM probe hits; cached "OK" returned; value matches | | `TestExecDedup_PriorAttemptDidNotLand_Applies` | Attempt 1 pre-rejects | Reuse applies fresh; results match | | `TestExecDedup_GenuineConflictRebuildsAndApplies` | Concurrent SET advances key past `pending.startTS` | Reuse OCC-conflicts; probe misses; pending dropped; fresh attempt succeeds | | `TestExecDedup_SelfInflictedReuseConflict_ReturnsSuccess` | Reuse lands then surfaces WriteConflict | Self-conflict guard probes fresh `commitTS`; cached results returned (no double-apply) | | `TestExecDedup_DisabledKeepsLegacyPath` | Gate off | No probe; same result as legacy `runTransaction` | ## Validation - `go test ./adapter/ -run 'Txn|MULTI|EXEC|Dedup|TxnStartTS'` passes - `go test ./kv/ ./store/` both pass - `gofmt`, `go vet`, `golangci-lint run` all clean (0 issues across `adapter/kv/store`) ## Scope - **Single-mop EXEC** is the conservative scope per the design doc's "Open questions". The mechanism (cache results array; OCC fence on `readKeys`) works the same for multi-mop EXEC under the existing proof, but the test matrix doubles, so multi-mop validation is a follow-up. - Per-command reconstruction hooks for SET/INCR/HSET etc. when dispatched **outside** MULTI are still a follow-up; those go through individual command paths, not `runTransaction`. ## Design doc reference `docs/design/2026_05_21_proposed_txn_secondary_idempotency.md` §M3.
gemini HIGH (adapter/redis.go:3112 dispatchExecReuse): Clock().Next() bypasses the HLC-4 physical-ceiling fence; persistence-grade commit_ts allocation must use NextFenced() to avoid issuing a timestamp that collides with a subsequent leader's window after renewal. Switched dispatchExecReuse to NextFenced() with the standard error-propagation shape (matches prepareDispatch from PR #884 cherry-pick). gemini MEDIUM (adapter/redis.go:3175 reuseCtx propagation): The original "fresh ctx from handlerContext per reuse attempt" pattern ignored outer cancellation -- a disconnected client would wait the full 10 s before the reuse returned. Derived reuseCtx from dispatchCtx (the caller's retry-loop ctx) so outer cancellation interrupts mid-attempt. Per-attempt redisDispatchTimeout still caps the dispatch the same way commit does for the first attempt. Matches listPushCoreWithDedup's caller-ctx threading. gemini MEDIUM (adapter/redis.go:3215 readPin.Release on nil): False positive. ActiveTimestampToken.Release is documented nil-safe at kv/active_timestamp_tracker.go:58. The pattern readPin := r.pinReadTS(...); defer readPin.Release() is used unchanged in the existing runTransaction (legacy path); no change needed. Caller audit per /loop semantic-change rule: - dispatchExecReuse signature unchanged; the new error return is the same (bool, error) tuple position. Single caller is runTransactionWithDedup which already returns dispErr via dropping to the retry loop's error path -- the new NextFenced error reaches the same dispErr branch. - reuseCtx parent change is local to runTransactionWithDedup; the only observable effect is faster cancellation propagation, which retryRedisWrite already gates on ctx.Done between attempts. Note on pre-existing Clock().Next() callers: dispatchListPushReuse (adapter/redis.go:3508) and the listPushCoreWithDedup first-attempt site (adapter/redis.go:3679) -- both shipped on main as part of PR #796 -- still use Clock().Next() and have the same HLC-4 ceiling-bypass exposure. Out of scope for this PR (those are pre-existing on main, not introduced by PR-A's diff), but should be fixed in a follow-up cleanup PR for parity. Design doc: M3 "fresh reuseCtx from handlerContext" deviation note is struck and replaced with the dispatchCtx-derived rationale; future readers see why the earlier framing was wrong and what the current pattern is. Validation: go test ./adapter/ -run 'ExecDedup|TxnMULTI' passes. go build ./adapter/... clean. golangci-lint run ./adapter/... 0 issues.
…i-mop (#887) ## Summary PR #884 was merged into the stacked branch `docs/txn-idempotency-design` (at `cbbde3d7`) but never reached `main` — main has only PR #796's M1 + M2 + M3 RPUSH/LPUSH content. This PR re-lands the M3 EXEC reuse code on top of main and bundles three follow-ups that move the design doc's "still open" / "follow-up" items into "landed". Base: `main`. ## Changes ### 1. Re-land M3 EXEC reuse (PR #884's content) - `adapter/redis.go`: `txnContext.prepareDispatch()` split out of `commit()`; `reusableExecTxn`; `dispatchExecReuse`; `runTransactionWithDedup` + `firstExecAttempt`; gate at the top of `runTransaction`. - `adapter/redis_exec_dedup_test.go` (originally added in PR #884): 5 tests pinning all four reuse outcomes plus the gate-off legacy equivalence. - Cherry-pick adaptation: `prepareDispatch()` uses `Clock().NextFenced() (uint64, error)` on current main; the PR #884 version targeted `Clock().Next() (uint64)`. Same downstream semantics, error wired through `preparedTxnDispatch`. ### 2. Close M2 open item — FSM other-txn exactness test `TestOnePhaseDedup_OtherTxnVersionDoesNotMaskRetry` (`kv/fsm_onephase_dedup_test.go`) pins exactness at the FSM apply layer: a third-party version at `T_other=20` must not satisfy the FSM probe at `T1=30`, so the retry falls through and applies at the fresh `T2=40`. The store-layer pin already covers the primitive; this test covers the dispatch path that uses it. ### 3. M3 multi-mop EXEC dedup test `TestExecDedup_MultiMopLandedPriorAttempt_ReturnsCachedResults` extends single-mop dedup to a 3-command MULTI/EXEC body (`SET a + SET b + DEL c`). Validates that cached results + OCC `readKeys` fence are mop-count-agnostic. Without dedup the DEL would re-execute to 0 on the second pass — the test rejects that. ### 4. Design doc updates - §M2 "still open" → "LANDED" with the new FSM test reference - §M3 "runTransaction (MULTI/EXEC) — Still open" → "LANDED via PR #884" with multi-mop test reference and acknowledgement of two intentional deviations from the M1/M2 template that claude[bot] flagged on #884 (`readKeys` assembly order, fresh per-attempt `reuseCtx`) - §M3 "standalone SET/INCR/HSET" called out as PR-B follow-up ## Caller audit (per /loop semantic-change rule) - `prepareDispatch` (newly added): callers are `commit()` and `firstExecAttempt`; both honor `defer prepared.cancel()`. External behavior of `commit()` preserved. - `commit()`: internal structure changed; external behavior preserved (no test directly invokes it). - `runTransactionWithDedup` / `firstExecAttempt` / `dispatchExecReuse` / `reusableExecTxn`: all new symbols, exercised only from the gated `runTransaction` path. ## Validation - `go test ./adapter/ -run 'Dedup|Txn|MULTI|EXEC'` passes - `go test ./kv/ ./store/` both pass - `gofmt`, `go vet`, `golangci-lint run` all clean (0 issues) ## Relation to prior work | PR | Merged into | Content | |---|---|---| | #796 (`f481f2b7`) | main | M1 + M2 + M3 RPUSH/LPUSH | | #884 (`cbbde3d7`) | stacked branch, NOT main | M3 EXEC reuse (stranded) | | **This PR** | main | Re-land #884 + M2 cross-txn FSM test + multi-mop test + doc updates | | PR-B (next) | main | Standalone SET / INCR / HSET reuse paths | | PR-C (next) | main | M4 Jepsen validation infrastructure | <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * **Tests** * Added comprehensive regression test suite for transaction retry and deduplication handling, covering cached result reuse, conflict detection, and legacy behavior paths * Extended test coverage for standalone command transaction consistency * **Documentation** * Updated design documentation to reflect transaction handling improvements and recent test coverage additions <!-- review_stack_entry_start --> [](https://app.coderabbit.ai/change-stack/bootjp/elastickv/pull/887?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack) <!-- review_stack_entry_end --> <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
Design doc for fixing the
:duplicate-elementsanomaly surfaced by scheduled Jepsen run 26198185540.What's in the doc
:append 14 230from worker 3 produced TWO storage entries with value230, immediately visible in the next read. Same-commit_tsreplay is dedup'd by content-addressed list-delta keys; different-commit_tsreplay (regenerated byrunTransaction'sretryRedisWriteon partial-commit) is NOT, because RPUSH/LPUSH derive theirseqfrom a fresh meta read at attempt time.CheckTxnStatusbefore retry, LockResolver — and which one elastickv is missing.CheckTxnStatus) as a follow-up.Why now / why this layer
:duplicate-elementsis now visible to the checker rather than being silently classified as:info.runTransaction's retry loop AND the gRPCForwardredirect, and no client-supplied txn ID exists. Server-side marker is the natural enforcement point.applyCommitWithIdempotencyFallbackonly catches same-commit_tsreplay; the regenerated-input replay shape needs a logical-txn-identity marker.Companion PR
PR #795 ("upload full demo cluster log on Jepsen failure") is open in parallel. The design here is independent of which exact code path produces the partial-commit (that's what PR #795 will let us pinpoint); the fix works for any such path.
Status
Open for review. Implementation milestones (M1–M6) follow once this lands.
Summary by CodeRabbit
New Features
Tests
Documentation